Using Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars
نویسندگان
چکیده
An important problem when using Stochastic Inversion Transduction Grammars is their computational cost. More specifically, when dealing with corpora such as Europarl only one iteration of the estimation algorithm becomes prohibitive. In this work, we apply a reduction of the cost by taking profit of the bracketing information in parsed corpora and show machine translation results obtained with a bracketed Europarl corpus, yielding interresting improvements when increasing the number of non-terminal symbols.
منابع مشابه
Stochastic Inversion Transduction Grammars, with Application to Segmentation, Bracketing, and Alignment of Parallel Corpora
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rat...
متن کاملLearning Translations for Tagged Words: Extending the Translation Lexicon of an ITG for Low Resource Languages
We tackle the challenge of learning part-ofspeech classified translations as part of an inversion transduction grammar, by learning translations for English words with known part-of-speech tags, both from existing translation lexica and from parallel corpora. When translating from a low resource language into English, we can expect to have rich resources for English, such as treebanks, and smal...
متن کاملA Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment
We present two contributions to grammar driven translation. First, since both Inversion Transduction Grammar and Linear Inversion Transduction Grammars have been shown to produce better alignments then the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transd...
متن کاملStochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistic...
متن کاملTranslation as Linear Transduction Models and Algorithms for Efficient Learning in Statistical Machine Translation
Saers, M. 2011. Translation as Linear Transduction. Models and Algorithms for Efficient Learning in Statistical Machine Translation. Acta Universitatis Upsaliensis. Studia Linguistica Upsaliensia 9. 133 pp. Uppsala. ISBN 978-91-554-7976-3. Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions rep...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008